Automatic selection of clustering algorithms using supervised graph embedding
Abstract
The widespread adoption of machine learning (ML) techniques and the extensive expertise required to apply them have led to increased interest in automated ML solutions that reduce the need for human intervention. One of the main challenges in applying ML to previously unseen problems is algorithm selection: the identification of high-performing algorithm(s) for a given dataset, task, and evaluation measure. This study addresses the algorithm selection challenge for data clustering, a fundamental task in data mining aimed at grouping similar objects. We present MARCO-GE, a novel meta-learning approach for the recommendation of clustering algorithms. MARCO-GE first transforms datasets into graphs and then utilizes a graph convolutional neural network technique to extract their latent representation. Using the embedding representations obtained, MARCO-GE trains a ranking meta-model capable of accurately recommending top-performing algorithms for a new dataset. An evaluation on 210 datasets, 17 clustering algorithms, and 10 clustering measures demonstrates the effectiveness of our approach and its superiority in terms of predictive and generalization performance over state-of-the-art meta-learning approaches.
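The first two stages described in the abstract (dataset to graph, graph to latent representation) can be illustrated with a minimal sketch. This is not the MARCO-GE implementation: the abstract does not say how the graph is built or how the network is trained, so the sketch assumes a k-nearest-neighbour graph over Euclidean distance, and it uses a single untrained neighbour-averaging step as a stand-in for a graph convolutional layer. The names `knn_graph`, `propagate`, and `dataset_embedding` are hypothetical.

```python
import math


def knn_graph(points, k=2):
    """Build an undirected k-nearest-neighbour graph (assumed construction)."""
    n = len(points)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)  # mirror the edge so the graph stays undirected
    return neighbours


def propagate(points, neighbours):
    """One untrained GCN-style step: average each node with its neighbours."""
    out = []
    for i, p in enumerate(points):
        group = [p] + [points[j] for j in sorted(neighbours[i])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def dataset_embedding(points, k=2):
    """Mean-pool the propagated node features into one vector per dataset."""
    hidden = propagate(points, knn_graph(points, k))
    return [sum(col) / len(hidden) for col in zip(*hidden)]


# Toy dataset: two well-separated groups of three instances each.
toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
graph = knn_graph(toy, k=2)
embedding = dataset_embedding(toy, k=2)
```

In a pipeline of the kind the abstract describes, a fixed-length vector such as `embedding` would serve as the meta-feature input to a ranking meta-model trained across many datasets. That stage is omitted here because the abstract gives no details about it.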
Similar resources
Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection
Selecting discriminative features in positive unlabelled (PU) learning tasks is a challenging problem due to the lack of negative class information. Traditional supervised and semi-supervised feature selection methods cannot be applied directly in this scenario, and unsupervised feature selection algorithms are designed to handle unlabelled data while neglecting the available information f...
Simultaneous supervised clustering and feature selection over a graph.
In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph, where homogeneous groups or clusters are estimated as well as informative predictors, with each predictor corresponding to one node in the graph and a connecting path indicating a priori possible grouping among the corresponding predictors. The method seeks...
Semi-Supervised Clustering Using Genetic Algorithms
A semi-supervised clustering algorithm is proposed that combines the benefits of supervised and unsupervised learning methods. Data are segmented/clustered using an unsupervised learning technique that is biased toward producing segments or clusters as pure as possible in terms of class distribution. These clusters can then be used to predict the class of future points. For example in database ...
Graph Clustering with Dynamic Embedding
Graph clustering (or community detection) has long drawn enormous attention from the research on web mining and information networks. Recent literature on this topic has reached a consensus that node contents and link structures should be integrated for reliable graph clustering, especially in an unsupervised setting. However, existing methods based on shallow models often suffer from content nois...
Automatic Graph Clustering
We present a technique and a program for the automatic clustering of graphs. The technique is based on several heuristics, which allows for an efficient implementation on a personal computer. Our approach is capable of clustering graphs with > 3000 vertices efficiently. The demonstration shows an interactive user environment that supports both automatic and user-controlled clustering. As an applica...
Journal
Journal title: Information Sciences
Year: 2021
ISSN: 0020-0255, 1872-6291
DOI: https://doi.org/10.1016/j.ins.2021.08.028